GENOMIC SELECTION

Genetic Mapping and Genomic Selection Using Recombination Breakpoint Data

Author

  • Shizhong Xu
Abstract

The correct models for quantitative trait locus mapping are the ones that simultaneously include all significant genetic effects. Such models are difficult to handle for high marker density. Improving statistical methods for high-dimensional data appears to have reached a plateau. Alternative approaches must be explored to break the bottleneck of genomic data analysis. The fact that all markers are located in a few chromosomes of the genome leads to linkage disequilibrium among markers. This suggests that dimension reduction can also be achieved through data manipulation. High-density markers are used to infer recombination breakpoints, which then facilitate construction of bins. The bins are treated as new synthetic markers. The number of bins is always manageable, on the order of a few thousand. Using the bin data of a recombinant inbred line population of rice, we demonstrated genetic mapping, using all bins in a simultaneous manner. To facilitate genomic selection, we developed a method to create user-defined (artificial) bins, in which breakpoints are allowed within bins. Using eight traits of rice, we showed that artificial bin data analysis often improves the predictability compared with natural bin data analysis. Of the eight traits, three showed high predictability, two had intermediate predictability, and two had low predictability. A binary trait with a known gene had near-perfect predictability. Genetic mapping using bin data points to a new direction of genomic data analysis.

Quantitative trait loci (QTL) can be mapped to chromosome regions, owing to the discovery of molecular markers. Early studies had few and widely spaced markers, leading to poor estimation of QTL effects. Lander and Botstein's (1989) interval mapping revolutionized genetic mapping and made it possible to locate QTL in intervals between observed markers.
Increased marker density, along with increased sample size, can further increase the resolution of QTL mapping (Wright and Kong 1997). We are now in a situation that is opposite to interval mapping: we need to delete markers with the same information content. A genome is easily saturated with a few million SNPs and, as such, interval mapping is no longer required. One can simply analyze markers one at a time and scan the entire genome for significant markers. This type of one-dimensional marker analysis does not present a computational challenge. However, the approach is technically flawed if there is more than one QTL in the genome. Various modifications of the one-dimensional scan have been proposed, such as the composite-interval mapping (CIM) procedure (Jansen and Stam 1994; Zeng 1994). The goal of CIM is to estimate one major QTL that is detectable and, at the same time, to correct for effects from other detectable major QTL and for the "polygenic effects" that are not detectable. The CIM method also faces a new challenge regarding how to choose the cofactors to capture the background information. The results are often unstable because different markers selected as cofactors can lead to different results. A better approach to QTL mapping has been the multiple-interval mapping (MIM) procedure (Kao et al. 1999), in which all intervals are included as candidate regions and the actual QTL-associated intervals are searched via a stepwise regression analysis. When the marker density is too high, the number of intervals can be huge, presenting a great computational problem for the method. Therefore, the MIM method, in its original form, is no longer the best option. If one evaluates only a fixed number of positions in the genome, the model dimension will not change as the marker density increases. In this case, high-density markers will further reduce the uncertainty of genotype inferences for the positions evaluated.
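The one-marker-at-a-time scan mentioned above can be illustrated with a minimal sketch. It is not taken from the paper: the function name `single_marker_scan`, the 0/1 genotype coding, and the use of simple least-squares regression are all assumptions made for illustration.

```python
from statistics import mean


def single_marker_scan(genotypes, phenotypes):
    """Regress the phenotype on each marker separately (hypothetical helper).

    genotypes:  list of individuals, each a list of numeric marker codes
                (assumed 0/1 coding, as for recombinant inbred lines).
    phenotypes: list of trait values, one per individual.
    Returns a list of (slope, r_squared) tuples, one per marker.
    """
    n_markers = len(genotypes[0])
    y_bar = mean(phenotypes)
    ss_y = sum((y - y_bar) ** 2 for y in phenotypes)
    results = []
    for j in range(n_markers):
        x = [row[j] for row in genotypes]
        x_bar = mean(x)
        ss_x = sum((xi - x_bar) ** 2 for xi in x)
        if ss_x == 0 or ss_y == 0:
            # A marker (or trait) with no variation carries no information.
            results.append((0.0, 0.0))
            continue
        s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, phenotypes))
        slope = s_xy / ss_x
        r_squared = s_xy ** 2 / (ss_x * ss_y)
        results.append((slope, r_squared))
    return results


# Toy data: four lines, three markers; the trait follows marker 0 exactly.
toy_genotypes = [[0, 0, 1], [0, 1, 1], [1, 0, 0], [1, 1, 0]]
toy_phenotypes = [1.0, 1.0, 3.0, 3.0]
scan = single_marker_scan(toy_genotypes, toy_phenotypes)
```

In the toy data, marker 2 is the mirror image of marker 0, so it explains the trait just as well. Each marker is tested in isolation, so the scan cannot tell correlated markers apart or account for the effects of other QTL, which is the weakness that CIM and MIM try to address.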
The model dimension will increase as the number of evaluated positions increases. However, the model dimension cannot be larger than the sample size, which is due to the intrinsic limitation of the maximum-likelihood method.

Copyright © 2013 by the Genetics Society of America. doi: 10.1534/genetics.113.155309. Manuscript received July 12, 2013; accepted for publication August 14, 2013. Supporting information is available online at http://www.genetics.org/lookup/suppl/doi:10.1534/genetics.113.155309/-/DC1. Address for correspondence: Department of Botany and Plant Sciences, University of California, Riverside, CA 92521. E-mail: [email protected] Genetics, Vol. 195, 1103–1115, November 2013.

The Bayesian method is a better alternative to the MIM procedure (Satagopan et al. 1996; Sillanpää and Arjas 1998, 1999). One major advantage of the Bayesian method is the ability to assign an informative prior distribution to QTL parameters, especially QTL effects. An informative prior will penalize large estimated effects and thus shrink estimated QTL effects toward zero. The consequence of using shrinkage priors is the ability to handle high-dimensional models. The MCMC-implemented Bayesian methods involve changes in model dimension, which presents another challenge because the Markov chains often take a long time to converge. In addition, the computational complexity increases when we have to manage millions of markers. Meuwissen et al. (2001) adopted a new Bayesian method with a fixed model dimension to evaluate the entire genome, using high-density SNP markers. Their purpose was not to detect QTL, but rather to predict breeding values, a new form of marker-assisted selection. Their work was not well recognized until recently, when high-density markers became widely available in many organisms. The approach is known as "genomic selection" and has become very popular in animals and plants (Hayes et al. 2009; Heffner et al. 2009) as well as in humans (Yang et al. 2010) and laboratory animals (Ober et al. 2012). Xu (2003) and Wang et al. (2005) realized that this idea can be applied to line-crossing experiments for both QTL detection and genomic selection. In genomic selection, all genomic positions are considered, although there is some adjustment for linkage disequilibrium, such as forcing positions to be d cM apart, where d may be 1 or 2 (Meuwissen et al. 2001).

The least absolute shrinkage and selection operator (LASSO) method (Tibshirani 1996) is an alternative Bayesian method that can achieve the same goal of handling large models while avoiding MCMC sampling. In terms of computational speed, the LASSO method implemented in the GlmNet/R program (Friedman et al. 2010) is the fastest among the available software packages. Unfortunately, even the GlmNet/R program cannot produce satisfactory results for a model containing a few million SNPs (Hu et al. 2012). It appears that statistical approaches have reached a plateau and further studies of genetic mapping via new statistical methods alone may lead nowhere.

Two research teams led by Qifa Zhang and Bin Han in China pioneered groundbreaking work in genetic mapping (Huang et al. 2009; Xie et al. 2010; Yu et al. 2011). They used high-density SNP markers to infer recombination breakpoints and then converted the breakpoint data into bin data. All markers within a bin have the same segregation pattern. Each bin is considered a new marker. QTL mapping is then performed using the bin data. Since the number of bins in a finite population is always finite and can be substantially smaller than the original number of markers, genetic mapping using the bin data is much easier than that using the original markers. The model dimension can be substantially smaller, yet without loss of information. This is an alternative dimension reduction technique that requires no comprehensive statistical methods.
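The conversion from marker data to bin data described above can be sketched as follows. This is a simplified illustration, not the procedure used in the cited studies: it assumes error-free genotypes with no missing calls, markers already ordered along one chromosome, and a hypothetical function name `build_bins`. A real pipeline would also need to handle genotyping errors and missing data when inferring breakpoints.

```python
def build_bins(genotypes):
    """Collapse runs of consecutive markers with identical segregation
    patterns into bins (hypothetical helper, error-free data assumed).

    genotypes: list of individuals, each a list of marker codes ordered
               along one chromosome.
    Returns (bins, bin_genotypes):
      bins          - list of (first_marker, last_marker) index pairs
      bin_genotypes - one row per individual, one column per bin
    """
    n_markers = len(genotypes[0])
    # Column j holds the segregation pattern of marker j across all lines.
    columns = [tuple(row[j] for row in genotypes) for j in range(n_markers)]
    bins = []
    start = 0
    for j in range(1, n_markers):
        # A change in pattern means at least one line has a recombination
        # breakpoint between marker j-1 and marker j.
        if columns[j] != columns[j - 1]:
            bins.append((start, j - 1))
            start = j
    bins.append((start, n_markers - 1))
    # Every marker in a bin has the same genotypes, so the first one
    # represents the whole bin.
    bin_genotypes = [[row[first] for first, _ in bins] for row in genotypes]
    return bins, bin_genotypes


# Toy data: three lines, six markers, each line with one breakpoint.
toy_markers = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1],
]
toy_bins, toy_bin_genotypes = build_bins(toy_markers)
```

In this toy example six markers collapse into four bins. Because every marker inside a bin carries identical genotypes, the reduction discards nothing, and the number of bins is bounded by the number of breakpoints in the population rather than by the number of markers.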
The bin data analysis is potentially more useful than the original marker analysis in detection of epistatic effects (G × G) and G × E interactions. This study aims to investigate the properties of bin data and use bin data to perform QTL mapping and genomic selection.

Materials and Methods

Similar resources

The Pattern of Linkage Disequilibrium in Livestock Genome

Linkage disequilibrium (LD) is the basis of genomic selection, genomic marker imputation, marker-assisted selection (MAS), quantitative trait loci (QTL) mapping, parentage testing and whole-genome association studies. Particular alleles at closely spaced loci tend to be co-inherited. At linked loci, this pattern leads to an association between alleles in the population, which is known as LD. Two metr...

The Impact of Different Genetic Architectures on Accuracy of Genomic Selection Using Three Bayesian Methods

Genome-wide evaluation uses the associations of a large number of single nucleotide polymorphism (SNP) markers across the whole genome and then combines the statistical methods with genomic data to predict the genetic values. Genomic prediction relies on linkage disequilibrium (LD) between genetic markers and quantitative trait loci (QTL) in a population. Methods that use all markers simultaneo...

Comparison of Different Statistical Methods in Genomic Selection of Holstein Cattle

Genomic selection combines statistical methods with genomic data to predict genetic values for complex traits. The accuracy of prediction of genetic values in the selected population has a great effect on the success of this selection method. Accuracy of genomic prediction is highly dependent on the statistical model used to estimate marker effects in the reference population. Various factors such a...

Efficiency of Genomic Selection in Breeding Programs of Native Hens

The development of genomic selection has created new strategies in animal breeding programs. The aim of this study was to investigate the efficiency of genomic selection in breeding programs of native hens. In this study, a reference scenario with 3380 birds using pedigree and phenotypic information was simulated and the expected genetic progress was derived deterministically with the software ...

The Effect of Dams of Sire Path Management on Genetic and Economic Parameters in a Simulated Genomic Selection Program

A deterministic model based on the gene flow method, considering the features of Iranian Holstein cattle population, was implemented in this study to evaluate the effect of altering the number of age-classes in the dams of future sire (DS) path and the number of dams required for breeding a young bull (YB), to be evaluated as future sire, on genetic gain and resultant economic efficiency of a g...

Publication date: 2014